Thesaurus Expansion using Similar Wor

Author

  • Yoshimi Suzuki
Abstract

In both written and spoken language, we sometimes use different words to express the same meaning. For instance, “constraint” (seigen) and “restriction” (seiyaku) are used with the same meaning. This makes text classification and text summarization difficult. To deal with this problem, dictionaries, especially thesauri, are used. However, technical papers and patent documents contain many new words that are not listed in dictionaries. In this paper, we propose a method to accurately extract words that are semantically similar to each other. Using this method, we extracted similar word pairs from patent documents. We also expanded a thesaurus using the extracted similar words.
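
The last step of the abstract, expanding a thesaurus with extracted similar word pairs, could look roughly like the sketch below. It is an illustration under stated assumptions, not the author's method: the abstract does not say how pairs are scored or how a new word is attached, so the function name `expand_thesaurus`, the `(word, word, score)` pair format, the `threshold` parameter, and the rule "add an unknown word to every category of its known partner" are all hypothetical.

```python
def expand_thesaurus(thesaurus, similar_pairs, threshold=0.5):
    """Return a copy of `thesaurus` with new words attached.

    thesaurus:     dict mapping a category label to a set of words
    similar_pairs: iterable of (word_a, word_b, score) tuples
    threshold:     minimum score for a pair to be used (assumed value)
    """
    # Index the words already in the thesaurus by their categories.
    word_to_categories = {}
    for category, words in thesaurus.items():
        for word in words:
            word_to_categories.setdefault(word, set()).add(category)

    # Work on a copy so the input thesaurus is left untouched.
    expanded = {category: set(words) for category, words in thesaurus.items()}

    for word_a, word_b, score in similar_pairs:
        if score < threshold:
            continue  # skip weakly similar pairs
        # Try both directions: one word known, the other new.
        for known, new in ((word_a, word_b), (word_b, word_a)):
            if known in word_to_categories and new not in word_to_categories:
                for category in word_to_categories[known]:
                    expanded[category].add(new)
    return expanded


# Toy input; the scores are made up for illustration.
thesaurus = {"limit": {"seigen"}}
pairs = [("seigen", "seiyaku", 0.8), ("seigen", "kisei", 0.2)]
expanded = expand_thesaurus(thesaurus, pairs)
```

In this toy input, "seiyaku" is attached to the category of "seigen" because the pair clears the threshold, while "kisei" is left out. Pairs in which neither word is already in the thesaurus are ignored, since there is no existing category to attach them to.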

Similar resources

English-Japanese Cross-lingual Query Expansion Using Random Indexing of Aligned Bilingual Text Data

Vector space models can be used for extracting semantically similar words from the co-occurrence statistics of words in large text data. In this paper, we report on our NTCIR 2002 experiments using the Random Indexing vector space method for extracting an English-Japanese cross-lingual thesaurus from aligned English-Japanese bilingual data. The cross-lingual thesaurus has been used for automatic...

The Exploration and Analysis of Using Multiple Thesaurus Types for Query Expansion in Information Retrieval

This paper proposes the use of multiple thesaurus types for query expansion in information retrieval. A hand-crafted thesaurus, a corpus-based co-occurrence thesaurus, and a syntactic-relation-based thesaurus are combined and used as a tool for query expansion. Simple word sense disambiguation is performed to avoid misleading expansion terms. Experiments using the TREC-7 collection proved that thi...

Query Expansion using an Automatically Constructed Thesaurus

Our group participated in the Japanese and English Retrieval Subtasks of NTCIR-6. Our goal was to evaluate the effectiveness of a thesaurus constructed from patents for invalidity search. To confirm the effectiveness of our thesaurus-based query expansion, we conducted experiments and found that our method can improve upon traditional document retrieval systems.

Assessing the Impact of Thesaurus-Based Expansion Techniques in QA-Centric IR

In this paper, we assess the impact of using thesaurus-based query expansion methods at the Information Retrieval (IR) stage of a Question Answering (QA) system. We focus on expanding queries for questions regarding actions and events, where verbs have particularly important roles. Two different thesauri are used: the OpenOffice thesaurus and an automatically generated verb thesaurus. The per...

Publication date: 2006